We propose an embarrassingly simple point annotation scheme to collect weak supervision for instance segmentation. In addition to bounding boxes, we collect binary labels for a set of points sampled uniformly inside each bounding box. We show that existing instance segmentation models developed for full mask supervision can be seamlessly trained with point-based supervision collected via our scheme. Remarkably, Mask R-CNN trained on COCO, PASCAL VOC, Cityscapes, and LVIS with only 10 annotated random points per object achieves 94%-98% of its fully-supervised performance, setting a strong baseline for weakly-supervised instance segmentation. The new point annotation scheme is about 5 times faster than annotating full object masks, making high-quality instance segmentation more accessible in practice. Inspired by the point-based annotation form, we propose a modification to the PointRend instance segmentation module. For each object, the new architecture, called Implicit PointRend, generates parameters for a function that makes the final point-level mask prediction. Implicit PointRend is more straightforward and uses a single point-level mask loss. Our experiments show that the new module is better suited for point-based supervision.
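As a rough illustration of the annotation scheme (the routine and names below are ours, not the authors' released code), sampling points uniformly inside a box and supervising mask logits at those points might look like this in PyTorch:

```python
# Illustrative sketch, not the paper's implementation: sample K points
# uniformly inside a bounding box and train with a point-level BCE loss.
import torch
import torch.nn.functional as F

def sample_points_in_box(box, k=10):
    """box = (x0, y0, x1, y1); returns (k, 2) absolute xy coordinates."""
    x0, y0, x1, y1 = box
    xy = torch.rand(k, 2)
    xy[:, 0] = x0 + xy[:, 0] * (x1 - x0)
    xy[:, 1] = y0 + xy[:, 1] * (y1 - y0)
    return xy

def point_mask_loss(logits_at_points, point_labels):
    """Binary labels (1 = object, 0 = background) collected per point."""
    return F.binary_cross_entropy_with_logits(logits_at_points, point_labels)

# Example: 10 annotated points for one object
points = sample_points_in_box((12.0, 30.0, 88.0, 120.0), k=10)
labels = torch.tensor([1., 1., 0., 1., 0., 1., 1., 0., 1., 1.])
logits = torch.randn(10)  # stand-in for mask logits sampled at `points`
loss = point_mask_loss(logits, labels)
```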
In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimise the label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages to ensure a high accuracy for the images of each identity. To assess face recognition performance using the new dataset, we train ResNet-50 (with and without Squeeze-and-Excitation blocks) Convolutional Neural Networks on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 leads to improved recognition performance over pose and age. Finally, using the models trained on these datasets, we demonstrate state-of-the-art performance on the IJB face recognition datasets, exceeding the previous state of the art by a large margin. The dataset and models are publicly available.
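A minimal sketch of the training setup described, assuming a plain torchvision ResNet-50 (torchvision does not ship the Squeeze-and-Excitation variant) with a classification head over the 9131 identities:

```python
# Hedged sketch of the classifier training described above (our
# illustration): plain ResNet-50 over the 9131 VGGFace2 identities.
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 9131)  # one logit per identity

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

images = torch.randn(8, 3, 224, 224)       # stand-in batch
identities = torch.randint(0, 9131, (8,))  # stand-in labels
loss = criterion(model(images), identities)
loss.backward()
optimizer.step()
```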
Automated offensive language detection is essential in combating the spread of hate speech, particularly in social media. This paper describes our work on offensive language identification in the low-resource Indic language Marathi. The problem is formulated as a text classification task to identify a tweet as offensive or non-offensive. We evaluate different monolingual and multilingual BERT models on this classification task, focusing on BERT models pre-trained on social media datasets. We compare the performance of MuRIL, MahaTweetBERT, MahaTweetBERT-Hateful, and MahaBERT on the HASOC 2022 test set. We also explore external data augmentation from other existing Marathi hate speech corpora, HASOC 2021 and L3Cube-MahaHate. MahaTweetBERT, a BERT model pre-trained on Marathi tweets, outperforms all other models when fine-tuned on the combined dataset (HASOC 2021 + HASOC 2022 + MahaHate), with an F1 score of 98.43 on the HASOC 2022 test set. With this, we also provide a new state-of-the-art result on the HASOC 2022 / MOLD v2 test set.
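As a hedged sketch of the fine-tuning recipe with HuggingFace Transformers — the model id below is our guess at the MahaTweetBERT checkpoint name, so substitute the actual released checkpoint:

```python
# Illustrative binary offensive/non-offensive fine-tuning setup.
# The model id is an assumption, not a confirmed checkpoint name.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_id = "l3cube-pune/marathi-tweets-bert"  # assumed id for MahaTweetBERT
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

batch = tokenizer(["<a Marathi tweet>"], return_tensors="pt",
                  truncation=True, padding=True)
labels = torch.tensor([1])  # 1 = offensive, 0 = non-offensive
out = model(**batch, labels=labels)
out.loss.backward()  # plug into any standard fine-tuning loop
```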
Pre-training large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. Although this method has proven to be effective for many domains, it might not always provide desirable benefits. In this paper, we study the effects of hateful pre-training on low-resource hate speech classification tasks. While previous studies on the English language have emphasized its importance, we aim to augment their observations with some non-obvious insights. We evaluate different variations of tweet-based BERT models pre-trained on hateful, non-hateful, and mixed subsets of a 40M tweet dataset. This evaluation is carried out for the Indian languages Hindi and Marathi. This paper provides empirical evidence that hateful pre-training is not the best pre-training option for hate speech detection. We show that pre-training on non-hateful text from the target domain provides similar or better results. Further, we introduce HindTweetBERT and MahaTweetBERT, the first publicly available BERT models pre-trained on Hindi and Marathi tweets, respectively. We show that they provide state-of-the-art performance on hate speech classification tasks. We also release hateful BERT models for the two languages, along with HateEval-Hi and HateEval-Mr, gold hate speech evaluation benchmarks each consisting of 2,000 manually labeled tweets. The models and data are available at https://github.com/l3cube-pune/MarathiNLP .
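A sketch of the kind of domain-adaptive masked-language-model pre-training the paper compares; the base checkpoint and tweet strings here are placeholders, not the released models:

```python
# Hedged sketch: continued MLM pre-training on in-domain tweets.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
encodings = tokenizer(["<tweet 1>", "<tweet 2>"], truncation=True)
batch = collator([{"input_ids": ids} for ids in encodings["input_ids"]])
loss = model(**batch).loss  # masked-token prediction loss
loss.backward()
```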
Unmanned aerial vehicles are becoming increasingly popular these days, with applications crossing the boundaries between science and industry: aerial photography, package delivery, and disaster management all benefit from the technology. But before they become commonplace, there are challenges to be addressed to make them reliable and safe. The following paper discusses the challenges associated with precision landing of unmanned aerial vehicles, including methods for sensing and control and their advantages and disadvantages in various applications.
As drone technology improves, these versatile autonomous vehicles have found an increasing number of uses, from surveillance to aerial photography to package delivery, and each of these applications brings unique challenges. This paper implements a solution to one such challenge: landing on a moving target. This problem has been addressed before with varying degrees of success, but most implementations focus on indoor applications. Outdoor environments pose greater challenges in the form of variables such as wind and lighting, and outdoor drones are heavier and more susceptible to inertial effects. Our approach is purely vision-based, using a monocular camera and fiducial markers to localize the drone, and PID control to follow and land on the platform.
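A minimal single-axis PID sketch of the control described (gains and variable names are illustrative, not the paper's tuned values):

```python
# Illustrative PID control of the kind described: the error is the fiducial
# marker's offset from image center, the output a velocity command.
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# One axis: marker offset (pixels) -> lateral velocity command (m/s)
pid_x = PID(kp=0.004, ki=0.0002, kd=0.001)       # placeholder gains
marker_x, image_center_x = 412.0, 320.0          # example pixel coordinates
vx = pid_x.step(error=marker_x - image_center_x, dt=1 / 30)
```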
Core to everyday tasks such as reading and driving is active object recognition. Attempts to model such tasks are currently hindered by the inability to incorporate time. People exhibit a flexible tradeoff between speed and accuracy, and this tradeoff is a crucial human skill. Deep neural networks have emerged as promising candidates for predicting peak human object recognition performance and neural activity. However, modeling the temporal dimension, i.e., the speed-accuracy tradeoff (SAT), is essential for them to serve as useful computational models of how humans recognize objects. To this end, we here present the first large-scale (148 observers, 4 neural networks, 8 tasks) dataset of the speed-accuracy tradeoff (SAT) in recognizing ImageNet images. In each human trial, a beep, sounded at a fixed delay after the image is shown, indicates the desired response time, and the observer's response counts only if it occurs near the time of the beep. In a series of blocks, we test many beep delays, i.e., response times. We observe that human accuracy increases with response time, and proceed to compare its characteristics with the behavior of several dynamic neural networks capable of inference-time adaptive computation. Using FLOPs as an analog of response time, we compare networks with humans on curve-fit error, category-wise correlation, and curve steepness, and conclude that cascaded dynamic neural networks are a promising model of human response times in object recognition tasks.
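A sketch of the curve comparison, assuming a logistic saturating form and made-up accuracy values; the paper's exact functional form and data may differ:

```python
# Illustrative fit of accuracy vs. response time (humans) or FLOPs
# (networks); comparing fitted parameters and fit error across the two.
import numpy as np
from scipy.optimize import curve_fit

def sat_curve(t, a, b, t0):
    """Accuracy rises from chance toward asymptote a as time t grows."""
    return a / (1.0 + np.exp(-b * (t - t0)))

rt_ms = np.array([200, 400, 600, 800, 1000])          # beep delays (ms)
human_acc = np.array([0.18, 0.42, 0.63, 0.71, 0.74])  # made-up accuracies

params, _ = curve_fit(sat_curve, rt_ms, human_acc, p0=[0.8, 0.01, 500])
fit_error = np.mean((sat_curve(rt_ms, *params) - human_acc) ** 2)
```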
Athena 2.0 is an Alexa Prize SocialBot that has been a finalist in the last two Alexa Prize Grand Challenges. One reason for Athena's success is its novel dialogue management strategy, which allows it to dynamically construct dialogues and responses from component modules, leading to novel conversations with every interaction. Here we describe Athena's system design and performance in the Alexa Prize during the 20/21 competition. A live demo of Athena, as well as video recordings, will provoke discussion of the state of the art in conversational AI.
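A toy sketch of the module-based idea (component names and the confidence-ranking rule are invented for illustration, not Athena's actual architecture):

```python
# Illustrative only: several response generators propose candidates and a
# ranker picks one per turn; dialogue state tracking is omitted.
from typing import Callable

def rank(candidates: list[tuple[str, float]]) -> str:
    return max(candidates, key=lambda c: c[1])[0]

def respond(user_turn: str,
            modules: list[Callable[[str], tuple[str, float]]]) -> str:
    # Each module returns (response, confidence).
    return rank([m(user_turn) for m in modules])

news = lambda u: ("Speaking of that, did you see the news today?", 0.4)
chitchat = lambda u: ("That sounds fun! Tell me more.", 0.7)
print(respond("I went hiking yesterday.", [news, chitchat]))
```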
Dynamically triggered earthquakes and tremor generate two classes of weak seismic signals whose detection, identification, and authentication have traditionally required laborious analysis. In recent years, machine learning (ML) has become a powerful efficiency tool in geophysical analysis, including the detection of specific signals in time series. However, detecting weak signals buried in noise challenges ML algorithms, in part because ubiquitous training data are not always available. In such cases, ML can be as ineffective as a human expert is inefficient. At this intersection of effectiveness and efficiency, we leverage a third tool that has gained popularity over the past decade: citizen science. The citizen science project Earthquake Detective leverages volunteers' eyes and ears to detect and classify weak signals in seismograms from potentially dynamically triggered (PDT) events. Here, we present the Earthquake Detective dataset, a set of crowd-sourced labels of PDT earthquakes and tremor. We apply machine learning to classify these PDT seismic events and explore the challenges faced in separating and classifying such signals. We confirm that machine learning can detect signals from small earthquakes using image- and wavelet-based algorithms. In addition, we report that our ML algorithms can also detect signals from PDT tremor, which has not been previously demonstrated. The citizen science dataset of classifications and the ML code are available online.
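A hedged sketch of a wavelet-feature classifier of the flavor described, with synthetic stand-in data (not the project's released code):

```python
# Illustrative only: wavelet coefficients of a seismogram window become
# features for a standard classifier; data and wavelet choice are ours.
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier

def wavelet_features(trace, wavelet="db4", level=4):
    coeffs = pywt.wavedec(trace, wavelet, level=level)
    # Simple energy summary per decomposition band
    return np.array([np.sum(c ** 2) for c in coeffs])

rng = np.random.default_rng(0)
traces = rng.normal(size=(40, 2048))   # stand-in seismogram windows
labels = rng.integers(0, 2, size=40)   # 1 = PDT event, 0 = noise

X = np.stack([wavelet_features(t) for t in traces])
clf = RandomForestClassifier(n_estimators=100).fit(X, labels)
```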